Word Order Acquisition from Corpora

Authors

  • Kiyotaka Uchimoto
  • Masaki Murata
  • Qing Ma
  • Satoshi Sekine
  • Hitoshi Isahara
Abstract

In this paper we describe a method of acquiring word order from corpora. Word order is defined as the order of modifiers, or the order of phrasal units called 'bunsetsu' which depend on the same modifiee. The method uses a model which automatically discovers what the tendency of the word order in Japanese is by using various kinds of information in and around the target bunsetsus. This model shows us to what extent each piece of information contributes to deciding the word order and which word order tends to be selected when several kinds of information conflict. The contribution rate of each piece of information in deciding word order is efficiently learned by a model within a maximum entropy framework. The performance of this trained model can be evaluated by checking how many instances of word order selected by the model agree with those in the original text. In this paper, we show that even a raw corpus that has not been tagged can be used to train the model, if it is first analyzed by a parser. This is possible because the word order of the text in the ...
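The idea of learning feature contributions and then measuring agreement with the original text can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not list the features, so the particle and length features, the toy corpus, and all function names below are hypothetical. The sketch uses a binary logistic model, which is the two-class case of a maximum entropy model, trained by plain gradient ascent.

```python
import math

def features(a, b):
    """Hypothetical binary features for the candidate order 'a before b'."""
    return {
        "a_longer": 1.0 if a["length"] > b["length"] else 0.0,
        "b_longer": 1.0 if b["length"] > a["length"] else 0.0,
        f"particles={a['particle']}>{b['particle']}": 1.0,
    }

def prob_a_first(weights, a, b):
    """Model probability that modifier a precedes modifier b."""
    score = sum(weights.get(k, 0.0) * v for k, v in features(a, b).items())
    return 1.0 / (1.0 + math.exp(-score))

def train(pairs, epochs=200, lr=0.5):
    """pairs: (first, second) modifiers in the order observed in the text."""
    weights = {}
    for _ in range(epochs):
        for first, second in pairs:
            # The observed order is a positive example, its reversal a negative one.
            for a, b, label in ((first, second, 1.0), (second, first, 0.0)):
                p = prob_a_first(weights, a, b)
                for k, v in features(a, b).items():
                    weights[k] = weights.get(k, 0.0) + lr * (label - p) * v
    return weights

def agreement(weights, pairs):
    """Fraction of pairs where the model's preferred order matches the text."""
    hits = sum(prob_a_first(weights, f, s) > 0.5 for f, s in pairs)
    return hits / len(pairs)

# Toy data (invented): each pair of modifiers shares one modifiee.
corpus = [
    ({"particle": "wa", "length": 2}, {"particle": "wo", "length": 1}),
    ({"particle": "wa", "length": 1}, {"particle": "wo", "length": 1}),
    ({"particle": "ni", "length": 3}, {"particle": "wo", "length": 1}),
    ({"particle": "wo", "length": 3}, {"particle": "ni", "length": 1}),
]

weights = train(corpus)
print(agreement(weights, corpus))
```

In this toy setup the learned weights play the role of the contribution rates mentioned in the abstract: the last two pairs put the particle features in conflict, so the length features end up deciding the order.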


Similar resources

The Induction and Evaluation of Word Order Rules using Corpora based on the Two Concepts of Topological Models

Using dependency trees in natural language generation and machine translation raises the need to derive the word order from dependency trees. This task is difficult for languages with (partly) free word order and comparatively easier for languages with fixed word order. This paper describes (a) the two basic elements of topological models, (b) rule patterns for the mapping of dependency trees to ...


A Word-Order Database for Testing Computational Models of Language Acquisition

An investment of effort over the last two years has begun to produce a wealth of data concerning computational psycholinguistic models of syntax acquisition. The data is generated by running simulations on a recently completed database of word order patterns from over 3,000 abstract languages. This article presents the design of the database which contains sentence patterns, grammars and deriva...


The Acquisition of Word Order in a Topic-prominent Language: Corpus Findings and Experimental Investigation

It is well-known that Chinese has SVO as predominant word order, with variant orders OSV and SOV marking topic and focus, intimately linked to the topic-prominence of the language. Assuming early setting of the head parameter in syntactic acquisition and the peripheral positions of topic and focus in clausal structure, one might hypothesize that Chinese-speaking children will acquire the predom...


Word Order Acquisition in Persian Speaking Children

Objectives: Persian is a pro-drop language with canonical Subject-Object-Verb (SOV) word order. This study investigates the acquisition of word order in Persian-speaking children. Methods: In the present study, participants were 60 Persian-speaking children (30 girls and 30 boys) with typically developing language skills, and aged between 30-47 months. The 30-minute language samples were audio...


The company that words keep: comparing the statistical structure of child- versus adult-directed language.

Does child-directed language differ from adult-directed language in ways that might facilitate word learning? Associative structure (the probability that a word appears with its free associates), contextual diversity, word repetitions and frequency were compared longitudinally across six language corpora, with four corpora of language directed at children aged 1.0 to 5.0, and two adult-directed...


Publication date: 2000